R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

Title: Exploring the Neural Activity of Mice in Response to Stimuli

Introduction: Background of dataset: This dataset is from an experiment featuring 4 mice named Cori, Forssman, Hench, and Lederberg. Cori is the subject of sessions 1-3, Forssman is the subject of sessions 4-7, Hench is the subject of sessions 8-12, and Lederberg is the subject of sessions 13-18. Each session has around 100-500 trials. In each trial, our mouse subject is treated with visual stimuli, specifically a screen showing visual stimulation content each with instensity varying from 0,0.25,0.5, and 1 on its left and right.

When the left stimuli is a stronger stimulation value, a successful task for the mouse would be to turn its wheel to the left. On the other hand, when the right stimuli is a stronger stimulation value, a successful task for the mouse would be to turn its wheel to the right. When both left and right stimulations had equal intensity, the correct wheel turning direction was randomized. In each trial, the mouse was rewarded for a successful task and penalized for a failed task.

Scientists monitored the neural activities of each mice throughout the experiment, specifically the spikes of stimulation occurring in their neurons. Scientists recorded the spiking trains of each reacting neuron 0.4 seconds post-onset, and where each neuron was located in the brain.

Objective: The objective of our project is to recognize the patterns of the mice’s neural activities in our dataset, and utilize those patterns to make a prediction model for the success/failure rate of a random set of trials.

Exploratory analysis: Fist we load the packages required to run our code.

options(repos = list(CRAN="http://cran.rstudio.com/"))
install.packages("devtools")
## 
## The downloaded binary packages are in
##  /var/folders/rj/w7fkpmzd4pn8vypqdddtr01m0000gn/T//Rtmp9FGMOz/downloaded_packages
knitr::opts_chunk$set(eval = TRUE, echo = TRUE, fig.align='center')
install.packages("tidyverse")
## 
## The downloaded binary packages are in
##  /var/folders/rj/w7fkpmzd4pn8vypqdddtr01m0000gn/T//Rtmp9FGMOz/downloaded_packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
install.packages("knitr")
## 
## The downloaded binary packages are in
##  /var/folders/rj/w7fkpmzd4pn8vypqdddtr01m0000gn/T//Rtmp9FGMOz/downloaded_packages
library(knitr)
install.packages("dplyr")
## 
## The downloaded binary packages are in
##  /var/folders/rj/w7fkpmzd4pn8vypqdddtr01m0000gn/T//Rtmp9FGMOz/downloaded_packages
library(dplyr)
install.packages("kableExtra")
## 
## The downloaded binary packages are in
##  /var/folders/rj/w7fkpmzd4pn8vypqdddtr01m0000gn/T//Rtmp9FGMOz/downloaded_packages
install.packages("rvest")
## 
## The downloaded binary packages are in
##  /var/folders/rj/w7fkpmzd4pn8vypqdddtr01m0000gn/T//Rtmp9FGMOz/downloaded_packages
library(kableExtra)
## Warning in !is.null(rmarkdown::metadata$output) && rmarkdown::metadata$output
## %in% : 'length(x) = 2 > 1' in coercion to 'logical(1)'
## 
## Attaching package: 'kableExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library(rvest)
## 
## Attaching package: 'rvest'
## 
## The following object is masked from 'package:readr':
## 
##     guess_encoding
  1. describe the data structures across sessions (e.g., number of neurons, number of trials, stimuli conditions, feedback types):

First, I separated the dataset into our sessions by making a list with each component containing data from each of the 18 sessions.

session=list()
for(i in 1:18){
  session[[i]]=readRDS(paste('~/Downloads/sessions/session',i,'.rds',sep=''))
  }
 names(session[[i]])
## [1] "contrast_left"  "contrast_right" "feedback_type"  "mouse_name"    
## [5] "brain_area"     "date_exp"       "spks"           "time"

By using the function names(), we can see that there are 8 different variables in each session as shown above.

I summarized the information across sessions by labeling each session with the mouse that was treated, the date of the session, labeling these variables as mouse_name and date_exp. We added up the number of reactive neurons in each trial as well as the trial count, name these variables n_neurons and n_trials.

Finally,I found the success rate by finding the average feedback type value, while modifying the failure value of the feedback_type from -1 to 0. This way, the success rate would be a decimal between 0 and 1 and not -1 and 1.

# Summarize the information across sessions:


# Knowing what summary we want to report, we can create a tibble:
# All values in this function serve only as place holders

n.session=length(session)

# in library tidyverse
meta <- tibble(
  mouse_name = rep('name',n.session),
  date_exp =rep('dt',n.session),
  n_neurons = rep(0,n.session),
  n_trials = rep(0,n.session),
  success_rate = rep(0,n.session)
)


for(i in 1:n.session){
  tmp = session[[i]]
  meta[i,1]=tmp$mouse_name;
  meta[i,2]=tmp$date_exp;
  meta[i,3]=dim(tmp$spks[[1]])[1];
  meta[i,4]=length(tmp$feedback_type);
  meta[i,5]=mean(tmp$feedback_type+1)/2;
  
  
}

I made a table of our new variables using the function kable().

# In package knitr
mytable<-kable(meta, format = "html", table.attr = "class='table table-striped'",digits=2) %>%
              kable_styling()
mytable
mouse_name date_exp n_neurons n_trials success_rate
Cori 2016-12-14 734 114 0.61
Cori 2016-12-17 1070 251 0.63
Cori 2016-12-18 619 228 0.66
Forssmann 2017-11-01 1769 249 0.67
Forssmann 2017-11-02 1077 254 0.66
Forssmann 2017-11-04 1169 290 0.74
Forssmann 2017-11-05 584 252 0.67
Hench 2017-06-15 1157 250 0.64
Hench 2017-06-16 788 372 0.69
Hench 2017-06-17 1172 447 0.62
Hench 2017-06-18 857 342 0.80
Lederberg 2017-12-05 698 340 0.74
Lederberg 2017-12-06 983 300 0.80
Lederberg 2017-12-07 756 268 0.69
Lederberg 2017-12-08 743 404 0.76
Lederberg 2017-12-09 474 280 0.72
Lederberg 2017-12-10 565 224 0.83
Lederberg 2017-12-11 1090 216 0.81
# Convert 'meta' dataframe to an HTML table using kable()
mytable <- kable(meta, format = "html", table.attr = "class='table table-striped'", digits = 2) %>%
              kable_styling()

# Print the table
print(mytable)
## <table class="table table-striped table" style="margin-left: auto; margin-right: auto;">
##  <thead>
##   <tr>
##    <th style="text-align:left;"> mouse_name </th>
##    <th style="text-align:left;"> date_exp </th>
##    <th style="text-align:right;"> n_neurons </th>
##    <th style="text-align:right;"> n_trials </th>
##    <th style="text-align:right;"> success_rate </th>
##   </tr>
##  </thead>
## <tbody>
##   <tr>
##    <td style="text-align:left;"> Cori </td>
##    <td style="text-align:left;"> 2016-12-14 </td>
##    <td style="text-align:right;"> 734 </td>
##    <td style="text-align:right;"> 114 </td>
##    <td style="text-align:right;"> 0.61 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Cori </td>
##    <td style="text-align:left;"> 2016-12-17 </td>
##    <td style="text-align:right;"> 1070 </td>
##    <td style="text-align:right;"> 251 </td>
##    <td style="text-align:right;"> 0.63 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Cori </td>
##    <td style="text-align:left;"> 2016-12-18 </td>
##    <td style="text-align:right;"> 619 </td>
##    <td style="text-align:right;"> 228 </td>
##    <td style="text-align:right;"> 0.66 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Forssmann </td>
##    <td style="text-align:left;"> 2017-11-01 </td>
##    <td style="text-align:right;"> 1769 </td>
##    <td style="text-align:right;"> 249 </td>
##    <td style="text-align:right;"> 0.67 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Forssmann </td>
##    <td style="text-align:left;"> 2017-11-02 </td>
##    <td style="text-align:right;"> 1077 </td>
##    <td style="text-align:right;"> 254 </td>
##    <td style="text-align:right;"> 0.66 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Forssmann </td>
##    <td style="text-align:left;"> 2017-11-04 </td>
##    <td style="text-align:right;"> 1169 </td>
##    <td style="text-align:right;"> 290 </td>
##    <td style="text-align:right;"> 0.74 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Forssmann </td>
##    <td style="text-align:left;"> 2017-11-05 </td>
##    <td style="text-align:right;"> 584 </td>
##    <td style="text-align:right;"> 252 </td>
##    <td style="text-align:right;"> 0.67 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Hench </td>
##    <td style="text-align:left;"> 2017-06-15 </td>
##    <td style="text-align:right;"> 1157 </td>
##    <td style="text-align:right;"> 250 </td>
##    <td style="text-align:right;"> 0.64 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Hench </td>
##    <td style="text-align:left;"> 2017-06-16 </td>
##    <td style="text-align:right;"> 788 </td>
##    <td style="text-align:right;"> 372 </td>
##    <td style="text-align:right;"> 0.69 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Hench </td>
##    <td style="text-align:left;"> 2017-06-17 </td>
##    <td style="text-align:right;"> 1172 </td>
##    <td style="text-align:right;"> 447 </td>
##    <td style="text-align:right;"> 0.62 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Hench </td>
##    <td style="text-align:left;"> 2017-06-18 </td>
##    <td style="text-align:right;"> 857 </td>
##    <td style="text-align:right;"> 342 </td>
##    <td style="text-align:right;"> 0.80 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Lederberg </td>
##    <td style="text-align:left;"> 2017-12-05 </td>
##    <td style="text-align:right;"> 698 </td>
##    <td style="text-align:right;"> 340 </td>
##    <td style="text-align:right;"> 0.74 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Lederberg </td>
##    <td style="text-align:left;"> 2017-12-06 </td>
##    <td style="text-align:right;"> 983 </td>
##    <td style="text-align:right;"> 300 </td>
##    <td style="text-align:right;"> 0.80 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Lederberg </td>
##    <td style="text-align:left;"> 2017-12-07 </td>
##    <td style="text-align:right;"> 756 </td>
##    <td style="text-align:right;"> 268 </td>
##    <td style="text-align:right;"> 0.69 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Lederberg </td>
##    <td style="text-align:left;"> 2017-12-08 </td>
##    <td style="text-align:right;"> 743 </td>
##    <td style="text-align:right;"> 404 </td>
##    <td style="text-align:right;"> 0.76 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Lederberg </td>
##    <td style="text-align:left;"> 2017-12-09 </td>
##    <td style="text-align:right;"> 474 </td>
##    <td style="text-align:right;"> 280 </td>
##    <td style="text-align:right;"> 0.72 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Lederberg </td>
##    <td style="text-align:left;"> 2017-12-10 </td>
##    <td style="text-align:right;"> 565 </td>
##    <td style="text-align:right;"> 224 </td>
##    <td style="text-align:right;"> 0.83 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;"> Lederberg </td>
##    <td style="text-align:left;"> 2017-12-11 </td>
##    <td style="text-align:right;"> 1090 </td>
##    <td style="text-align:right;"> 216 </td>
##    <td style="text-align:right;"> 0.81 </td>
##   </tr>
## </tbody>
## </table>

The variable in this summary data frame that I predicted might have the strongest relationship with the success rate of the trials (a variable summarizing the feedback type variable that we will build a prediction model for) is n_neurons, or the number of active neurons in the mouse’s brain.I want to first examine whether the number of active neurons in each trial is related to the success rate by making a line graph of the number of active neurons vs the success rate.

# Create a new dataframe for the scatter plot data
scatter_data <- meta %>%
  select(n_neurons, success_rate)

# Create a scatter plot using ggplot2
scatter_plot <- ggplot(scatter_data, aes(x = n_neurons, y = success_rate)) +
  geom_point() +
  xlab("Number of Active Neurons") +
  ylab("Success Rate") +
  ggtitle("Number of Active Neurons vs Success Rate")

# Print the scatter plot
print(scatter_plot)

#add trend line

There seems to be a slight negative relationship between the number of active neurons and the success rate of the session, but the relationship between the two variables are not very strong. Digging deeper in later code, I would like to know which neurons in which area of the brain have the strongest connection between the success rate of the session. Maybe neurons in certain areas of the brain are activated when the brain is distracted, which would make sense if higher activity of neurons in those areas of the brain indicated lower success rates.

  1. explore the neural activities during each trial As a major component in my prediction model, I would like to take a look at the neuron activity, specifically the neuron spiking activity that occurred in different areas of the mice’s brain.

We make a plot visualizing where the neuron spikes occurred in each area of the brain in the 0.4 seconds post onset for the first 4 trials each in session 1 and session 18, where the test data will come from. This is useful in examining which areas of the brain had more spiked neurons post onset, and whether some neurons in certain areas of the brain spiked earlier/later post onset.

i.s=1 # indicator for this session

i.t=1 # indicator for this trial 
area.col <- c("red", "blue", "green","orange","purple")


plot.trial<-function(i.t, area,this_session){
    
    spks=this_session$spks[[i.t]];
    n.neuron=dim(spks)[1]
    time.points=this_session$time[[i.t]]
    area=unique(this_session$brain_area)
    
    plot(0,0,xlim=c(min(time.points),max(time.points)),ylim=c(0,n.neuron+1),col='white', xlab='Time (s)',yaxt='n', ylab='Neuron', main=paste('Trial ',i.t, 'feedback', this_session$feedback_type[i.t] ),cex.lab=1.5)
    for(i in 1:n.neuron){
        i.a=which(area== this_session$brain_area[i]);
        col.this=area.col[i.a]
        
        ids.spike=which(spks[i,]>0) # find out when there are spikes 
        if( length(ids.spike)>0 ){
            points(x=time.points[ids.spike],y=rep(i, length(ids.spike) ),pch='.',cex=10, col=col.this)
        }
      
            
    }
    
legend("topright", 
  legend = area, 
  col = area.col, 
  pch = 16, 
  cex = 0.8
  )
  }
#first 4 trials of session 1
plot.trial(1,area,session[[1]])

plot.trial(2,area,session[[1]])

plot.trial(3,area,session[[1]])

plot.trial(4,area,session[[1]])

Inspecting the first 4 trials of session 1, we can see that the successful trials were trial 1 and trial 2 (feedback type 1) while trial 3 and trial 4 were the failed trials. Looking for differences in the neuron activities between the successful and failed trials, I can see that the successful trials have barely any neuron spikes in the root area of the brain, seen through the absense of orange squares in the first 2 plots. On the other hand, the failed trials have slightly more neuron spikes in the root area.

#first 4 trials of session 18
plot.trial(1,area,session[[18]])

plot.trial(2,area,session[[18]])

plot.trial(3,area,session[[18]])

plot.trial(4,area,session[[18]])

Inspecting the first 4 trials of session 18, we can see that the successful trials were trial 1 and trial 2 (feedback type 1) while trial 3 and trial 4 were the failed trials once again. Looking for differences in the neuron activities between the successful and failed trials, I can see that the failed trials (trials 3 and 4) have slightly more neuron spikes in the root area, but more specifically in the later timespan of the 0.4 seconds post onset. We can see this through the denser distribution of orange squares in the 87-87.2 second timespan in trial 3, and 92.7-93 second timespan in trial 4.

To dig deeper in the activity of neurons in specific areas of the brain, we make a function called average_spike_area calculating the average number of neuron spikes in each area of the mouse’s brain in each trial.

average_spike_area<-function(i.t,this_session){

spk.trial = this_session$spks[[i.t]]
area= this_session$brain_area
spk.count=apply(spk.trial,1,sum)
spk.average.tapply=tapply(spk.count, area, mean)
return(spk.average.tapply)
}

Now we create a data frame that contains the average spike counts for each area, feedback type, the two contrasts, and the trial id for session 1. This will be useful in examining the relationship between average spike counts in certain brain areas with the value of the left and right contrasts, as well as our predictor variable (feedback type).

n.trial=length(session[[i.s]]$feedback_type)
n.area=length(unique(session[[i.s]]$brain_area ))
# Alternatively, you can extract these information in the meta that we created before.

# We will create a data frame that contain the average spike counts for each area, feedback type,  the two contrasts, and the trial id

trial.summary =matrix(nrow=n.trial,ncol= n.area+1+2+1)


for(i.t in 1:n.trial){
  trial.summary[i.t,]=c(average_spike_area(i.t,this_session = session[[i.s]]),session[[i.s]]$feedback_type[i.t],session[[i.s]]$contrast_left[i.t],session[[i.s]]$contrast_left[i.s],i.t)

}
colnames(trial.summary)=c(names(average_spike_area(i.t,this_session = session[[i.s]])), 'feedback', 'left contr.','right contr.','id' )

# Turning it into a data frame
trial.summary <- as_tibble(trial.summary)

I will now inspect the first 20 trials of session 1 (session from our test data) to speculate any possible relationships between average spike values in certain brain areas with left contrasts.

head(trial.summary,10)
## # A tibble: 10 × 12
##      ACA   CA3    DG    LS   MOs  root   SUB  VISp feedback `left contr.`
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>         <dbl>
##  1 1.27   1.53  2.32 1.94  0.938 0.5    2.08  1.69        1           0  
##  2 0.587  1.71  1.56 1.37  0.779 0.667  1.95  1.65        1           0  
##  3 1.19   1.99  3.41 1.73  0.920 1.44   2.64  2.28       -1           0.5
##  4 0.514  2.19  2.21 1.06  0.708 1.06   2.8   1.56       -1           0  
##  5 0.771  1.81  2.06 1.30  0.788 0.778  2.61  1.62       -1           0  
##  6 0.459  1.85  1.06 0.906 0.735 0.611  1.91  1.28        1           0  
##  7 1.22   2.53  2.88 1.86  0.929 2.22   3.32  2.74        1           1  
##  8 1.06   1.97  2    1.47  0.779 1.17   2.89  2.26        1           0.5
##  9 0.734  2.21  1.97 1.32  0.655 0.944  2.97  1.76        1           0  
## 10 1.30   2.16  2.41 1.86  1.03  1.39   2.08  1.93        1           0.5
## # ℹ 2 more variables: `right contr.` <dbl>, id <dbl>

#make a comment about root areas with left/right contrast.

Also, the first 20 trials of session 1 only have 6 failed trials (feedback of -1), which makes me wonder whether how the high success rate of these trials as anything to do with the right contrast having a value of 0. As a possible explanation, maybe Cori (subject mouse of session 1) is more keen to stimuli from the left side.

I would like to investigate these patterns between contrasts and neural activity in different areas of the brain further in my data integration portion.

  1. explore the changes across trials

We now create a visualization of the average spikes in each area of the brain arcoss trials session 1, the first half of the test data set.

area.col = rainbow(n = n.area, alpha = 0.7)
plot(0 ~ 1, col = 'white', xlim = c(0, 115), ylim = c(0, 5), xlab = "Trials", ylab = "Average spike counts", main = paste("Spikes per area in Session", 1))

for (i in 1:n.area) {
  lines(y = trial.summary[[i]], x = trial.summary$id, col = area.col[i], lty = 2, lwd = 1)
  lines(smooth.spline(trial.summary$id, trial.summary[[i]]), col = area.col[i], lwd = 3)
}

legend("topright", 
  legend = colnames(trial.summary)[1:n.area], 
  col = area.col, 
  lty = 1, 
  cex = 0.8
)

I see that the neurons in multiple areas of the brain, particularly SUB, VISp, DG, and MOs are relatively constant with a slight downward slope as the trials go on. This tells us that the neurons in these areas of the brain get slightly less active as the trials go on, which could indicate that as the experiment subject, Cori becomes fatigued through the long repetition of trials. The neuron activity in these areas could be the most correspondent to Cori’s fatigue. Also, the relative constantness of the neuron spikes counts across all areas of the brain may indicate that Cori may have a calmer, more relaxed mind with less sensitivity to stimuli.

Data Integration

After speculating Cori’s relative constant trend in neuron spikes and tendency to fatigue as trials go on, I would like to determine which mice share a similar tendency of neuron activity with Cori, and which mice share a similar tendency of neuron activity with Lederberg (subject in other half of test data, session 18). This is because I feel that it is logical to assume that mice with similar neuron activity share similar tendencies as well as variability in data. My approach is also conducted under the assumption that sessions done by the same mouse will have similar variability as well as success rates, as it is the same brain and body that is being experimented on.

  1. explore homogeneity and heterogeneity across sessions and mice Mice and their sessions: Cori: sessions 1-3 (test data-session1) Forssman:sessions 4-7 Hench:sessions 8-12 Lederberg:sessions 13-18 (test data-session18)

I wrapped my earlier code find average spikes in each area of the brain across sessions into a function so that I can apply them to different sessions in our experiment.

generate_trial_summary <- function(session_data) {
  n.trial <- length(session_data$feedback_type)
  n.area <- length(unique(session_data$brain_area))
  
  trial.summary <- matrix(nrow = n.trial, ncol = n.area + 1 + 2 + 1)
  
  for (i.t in 1:n.trial) {
    trial.summary[i.t, ] <- c(average_spike_area(i.t, this_session = session_data),
                              session_data$feedback_type[i.t],
                              session_data$contrast_left[i.t],
                              session_data$contrast_right[i.t],
                              i.t)
  }
  
  colnames(trial.summary) <- c(names(average_spike_area(i.t, this_session = session_data)),
                               "feedback",
                               "left contr.",
                               "right contr.",
                               "id")
  
  trial_summary <- as_tibble(trial.summary)
  
  return(trial_summary)
}

I took one session that each of the 4 mice were experimented on as the representative session for each mouse. I applied my generate_trial_summary function to each of the 4 mice’s representative sessions.

#Cori -session 1 (session from test data)
trial_summary_session1 <- as.data.frame(generate_trial_summary(session[[1]]))  # Generate trial summary for Session 1
head(trial_summary_session1)
##         ACA      CA3       DG        LS       MOs      root      SUB     VISp
## 1 1.2660550 1.529412 2.323529 1.9352518 0.9380531 0.5000000 2.080000 1.685393
## 2 0.5871560 1.705882 1.558824 1.3669065 0.7787611 0.6666667 1.946667 1.651685
## 3 1.1926606 1.985294 3.411765 1.7266187 0.9203540 1.4444444 2.640000 2.275281
## 4 0.5137615 2.191176 2.205882 1.0575540 0.7079646 1.0555556 2.800000 1.561798
## 5 0.7706422 1.808824 2.058824 1.3021583 0.7876106 0.7777778 2.613333 1.623596
## 6 0.4587156 1.852941 1.058824 0.9064748 0.7345133 0.6111111 1.906667 1.280899
##   feedback left contr. right contr. id
## 1        1         0.0          0.5  1
## 2        1         0.0          0.0  2
## 3       -1         0.5          1.0  3
## 4       -1         0.0          0.0  4
## 5       -1         0.0          0.0  5
## 6        1         0.0          0.0  6
#Forssman
trial_summary_session6 <- as.data.frame(generate_trial_summary(session[[6]]))  # Generate trial summary for Session 1
head(trial_summary_session6)
##         AUD      CA1      root       SSp        TH feedback left contr.
## 1 0.3170732 1.111111 0.7484076 0.2727273 0.4153226        1        0.25
## 2 0.3658537 1.388889 0.7707006 0.6363636 0.3508065        1        0.00
## 3 0.5000000 1.416667 0.9363057 0.4545455 0.3991935        1        1.00
## 4 0.4430894 1.361111 0.9713376 0.5454545 0.3145161        1        0.00
## 5 0.4105691 1.083333 0.6942675 0.1818182 0.3387097       -1        0.00
## 6 0.4918699 1.555556 0.7340764 0.1818182 0.4838710        1        0.00
##   right contr. id
## 1          0.0  1
## 2          0.0  2
## 3          0.0  3
## 4          0.5  4
## 5          0.0  5
## 6          0.0  6
#Hench
trial_summary_session10 <- as.data.frame(generate_trial_summary(session[[10]]))  # Generate trial summary for Session 1
head(trial_summary_session10)
##         CA1        DG       GPe       MB       MRN      POL      POST     root
## 1 0.8157895 0.8356164 0.7301587 1.570909 0.8333333 1.311688 0.9268293 1.833333
## 2 0.6578947 0.6986301 0.4603175 1.290909 0.8888889 1.116883 0.9512195 3.638889
## 3 0.8552632 0.9315068 0.7936508 1.741818 0.7638889 1.071429 0.9756098 2.027778
## 4 0.9078947 0.9315068 0.7619048 1.301818 0.9583333 1.396104 0.6341463 2.250000
## 5 0.8157895 0.7123288 0.6984127 1.578182 0.7083333 1.376623 0.7317073 1.250000
## 6 0.5921053 0.7260274 0.6349206 1.472727 0.8194444 1.227273 1.0731707 1.666667
##         SCm      SCsg     VISl      VISp     VISrl feedback left contr.
## 1 1.6521739 0.2631579 1.535354 0.8761905 0.4558824       -1           0
## 2 0.6086957 0.1578947 1.595960 0.6952381 0.2647059       -1           0
## 3 0.9130435 0.3684211 1.424242 1.2285714 0.5441176       -1           0
## 4 0.8260870 0.1578947 1.555556 1.0285714 0.4411765       -1           0
## 5 0.7391304 0.4736842 1.444444 0.9809524 0.4264706       -1           0
## 6 0.8260870 0.4210526 1.454545 0.9428571 0.3235294        1           0
##   right contr. id
## 1            1  1
## 2            1  2
## 3            1  3
## 4            1  4
## 5            1  5
## 6            1  6
#Lederberg  (session from test data)
trial_summary_session18 <- as.data.frame(generate_trial_summary(session[[18]]))   # Generate trial summary for Session 1
head(trial_summary_session18)
##         ACB       CA3        CP       LGd        OT     root       SI       SNr
## 1 1.2709677 1.6285714 1.1329114 0.7865169 0.8461538 2.080808 1.854167 0.8000000
## 2 1.1612903 1.4000000 1.1202532 1.0449438 0.5769231 1.858586 2.187500 1.0615385
## 3 0.7225806 1.4285714 0.9430380 1.2471910 0.5384615 1.919192 1.520833 0.9230769
## 4 0.8903226 0.8285714 0.6518987 1.1573034 0.4230769 1.646465 1.812500 0.8153846
## 5 0.7290323 1.2571429 0.5822785 0.9887640 0.3461538 2.363636 1.666667 0.8846154
## 6 0.9806452 1.4285714 0.8101266 0.7191011 0.8076923 1.939394 1.958333 0.8230769
##         TH        ZI feedback left contr. right contr. id
## 1 1.057143 1.1257143        1         0.5         0.00  1
## 2 1.411429 1.4114286        1         1.0         0.25  2
## 3 1.240000 1.2285714       -1         0.0         0.00  3
## 4 1.022857 1.1828571       -1         0.0         0.00  4
## 5 1.360000 1.3771429        1         0.0         0.00  5
## 6 1.251429 0.9314286        1         0.5         0.00  6

Looking at the first 6 trials of the summaries for each of the 4 mice’s representative sessions, I noticed that all of Hench and Lederberg’s average spike value for the root area of the brain were over 1. On the other hand, less than half of Cori and Forsmann’s trials had an average spike value over 1.

ncol(trial_summary_session1)
## [1] 12
ncol(trial_summary_session6)
## [1] 9
ncol(trial_summary_session10)
## [1] 17
ncol(trial_summary_session18)
## [1] 14

Through the function ncol(), we can also see that Cori and Forssmann have less areas of the brain that show neuron activity than Hench and Lederberg.

Exploring the relationship between left/right contrast and feedback types: I decided to find whether the 4 mice had certain sensitivites/preferences to stimuli on one side (left or right), or tend to be more successful with a certain combination of left and right stimuli values. I assigned left and right contrast values to the x and y axis, respectively.I then colored coded the dots in each trial so that failed trials (feedback -1) would appear darker and successful trials (feedback 1) would appear lighter.

generate_dotplot <- function(session_number, name, size = 5, alpha = 0.5, xlim = c(0, 1.2), ylim = c(0, 1.2), colors = c("-1" = "black", "1" = "white")) {
  # Create the dot plot for the session
  dotplot <- ggplot(generate_trial_summary(session[[session_number]]), aes(x = `left contr.`, y = `right contr.`, color = factor(feedback))) +
    geom_point(size = size, alpha = alpha) +
    labs(x = "Left Contrast", y = "Right Contrast", color = "Feedback Type") +
    scale_color_manual(values = colors) +
    coord_cartesian(xlim = xlim, ylim = ylim) +
    ggtitle(paste("Session", session_number,"-", name))
  
  return(dotplot)
}

Using this function on the 4 mice with their representative sessions:

#Cori (session from test data)
generate_dotplot(1, "Cori")

#Forssman
generate_dotplot(6, "Forssman")

#Hench
generate_dotplot(10, "Hench")

#Lederberg (session from test data)
generate_dotplot(18, "Lederberg")

Cori and Forssman both seem to have higher success rate when their experiment had higher left contrast and lower right contrast, which can be seen through the dots on the bottom right of their graphs being very light colored. This can indicate that Cori and Forssman both are more keen to stimuli on the left side, and tend to turn the wheel correctly when the left stimuli is stronger.

On the other hand, Hench and Lederberg show lighter color dots when the value of the right contrast is 0. This can indicate that Cori and Forssman both tend to have higher success rates when there is no stimuli from the right side.

I wrapped my earlier code creating a visualization of the neuron spikes counts in each area of the brain arcoss sessions in a function so that I can apply this function in different sessionsin our experiment.

plot_session <- function(session_number) {
  trial_summary <- generate_trial_summary(session[[session_number]])
  n.trial <- length(trial_summary$id)
  n.area <- ncol(trial_summary) - 4
  area.col <- rainbow(n = n.area, alpha = 0.7)
  
  # Find the minimum and maximum values of the trial summary data
  min_value <- min(trial_summary[, 1:n.area])
  max_value <- max(trial_summary[, 1:n.area])
  
  # Adjust the y-axis limits based on the data range
  ylim <- c(min_value - 0.1 * abs(min_value), max_value + 0.1 * abs(max_value))
  
  plot(0 ~ 1, col = 'white', xlim = c(0, n.trial), ylim = ylim, xlab = "Trials", ylab = "Average spike counts", main = paste("Spikes per area in Session", session_number))
  
  for (i in 1:n.area) {
    lines(y = trial_summary[[i]], x = trial_summary$id, col = area.col[i], lty = 2, lwd = 1)
    lines(smooth.spline(trial_summary$id, trial_summary[[i]]), col = area.col[i], lwd = 3)
  }
  
  legend("topright", 
         legend = colnames(trial_summary)[1:n.area], 
         col = area.col, 
         lty = 1, 
         cex = 0.8
  )
}

Now I use this function to obtain a visualization of the average spikes per area across trials in the same representative sessions of each of the 4 mice.

#Cori (session from test data)
plot_session(1)

#Forssman
plot_session(6)

#Hench
plot_session(10)

#Lederberg (session from test data)
plot_session(18)

I observe that across all neuron areas, the spiking activity of Cori and Hench are relatively constant across trials while Hench and Lederberg have several ups and downs in their spiking activity across trials. This could indicate the difference in sensitivity of the mice. Cori and Forssman are relatively calm and less sensitive to stimuli, hence would show less neuron activity in response to stimuli. On the other hand, Hench and Lederberg are more sensitive to stimuli, hence would more neuron activity in response to stimuli.

#also put observation on the up and down trends I also observe that while Cori and Hench’s spike counts for all of their active brain areas are relatively constant with no drastic ups and downs, the value of the average spike counts slowly decline as the trials go on. This trend indicates a tendency for both Cori and Hench to get fatigued as the experiment goes on. The neurons in their active brain areas become less responsive as the trials go on, hence show lower average spike counts.

On the other hand, Hench and Lederberg’s spike count trend for many of their active brain area illustrate a mellow hill-like shape (spike counts slowly rise in the first half of their trials, then decline in the second half of their trials). This trend indicates a tendency for both Hench and Lederberg to slowly become more engaged in the experiment in the middle of their session, hence the neuron spiking activity in the middle of their session are the most active. Then the mice slowly lose engagement in the experiment toward the latter half of their trials, hence the neurons in many areas of their brain become less active/responsive to stimuli.

Since I have already discovered trends between mice in activity in specifically the root area earlier in my data exploration section, I will now make a visualization summarizing the spike trends in the root area across all sessions that each mouse participated in. The “root” area is also one of the only brain areas that all 4 mice had nueron spiking activity in their representative sessions. Therefore, it would be logical to observe how the spiking activity in this area have heterogenity/homogenity between the mice.

#root area

plot_sessions <- function(mouse_name, start_session, end_session) {
  # Initialize variables
  area_col <- "root"  # Specify the column name for the desired area
  col <- colorRampPalette(c("lightblue", "darkblue"))(end_session - start_session + 1)  # Generate a sequence of progressively darker blue colors
  
  # Set x-axis and y-axis limits
  xlim <- c(0, 250)
  ylim <- c(0, 5)
  
  # Create an empty plot
  plot(0 ~ 1, col = 'white', xlim = xlim, ylim = ylim, xlab = "Trials", ylab = "Average spike counts", main = paste("Spikes in", area_col, "- ", mouse_name))
  
  for (session_number in start_session:end_session) {
    # Generate trial summary data for the session
    trial_summary <- generate_trial_summary(session[[session_number]])
    
    # Get the index of the current session
    session_index <- session_number - start_session + 1
    
    # Plot session data for the specified area
    lines(y = trial_summary[[area_col]], x = trial_summary$id, col = col[session_index], lty = ifelse(session_index == 1, 1, 2), lwd = 2)
    lines(smooth.spline(trial_summary$id, trial_summary[[area_col]]), col = col[session_index], lwd = 3)
    
    # Add session label
    text(x = trial_summary$id[1], y = trial_summary[[area_col]][1], labels = paste("Session", session_number), pos = 3)
  }
  
  # Add legend
  legend("topright", 
         legend = paste("Session", start_session:end_session), 
         col = col, 
         lwd = c(2, 3), 
         cex = 0.8
  )
}

# Call the plot_sessions function for each mouse

# Cori: sessions 1-3
plot_sessions("Cori", 1, 3)

# Forssman: sessions 4-7
plot_sessions("Forssman", 4, 7)

# Hench: sessions 8-11
plot_sessions("Hench", 8,11)

# Lederberg: sessions 12-18
plot_sessions("Lederberg", 12, 18)

We can see that the neuron spiking activity in the root area show very similar trends to our earlier visualization of spiking activity all brain areas, specifically in the homogenity/heterogenity between the 4 mice. The spiking activity for Cori and Forssman are relatively constant with not as many ups and down across trials, while Hench and Lederberg show dramatic ups and downs in their spiking activity through the trials.

Therefore I would say that the spiking activity in the root area would be an acceptable representative variable representing each mouse’s spiking activity across all brain areas. Hence ‘root’ will be the brain area I will be using as a predictive variable in my prediction model. I will also use the left and right contrast values as the other 2 predictor variables. This is because I was also able to extract trends in the relationship between contrast values and feedback types in my previous observation, sepcifically the homogenity in those trends between Cori/Forssmann qnd Hench/Lederberg.

Predictive Modeling

Because I was able to spot a clear similarity between the data trends of Cori/Forssman and Hench/Lederberg in my data integration section, I would like to split our experimental data into 2 parts to run my prediction model separately. The first dataframe will contain the sessions that Cori and Forssman participated, which will be used to build a prediction model for the first test data from session 1, since session 1 was experimented on Cori. The second dataframe will contain the sessions that Hench and Lederberg participated, which will be used to build a prediction model for the first test data from session 18 since session 18 was experimented on Lederberg.

# Iterate over sessions 1-18 and convert trial summaries to data frames
for (i in 1:18) {
  # Generate the trial summary for the current session
  trial_summary <- generate_trial_summary(session[[i]])
  
  # Assign the trial summary to a unique data frame name
  assign(paste0("session", i), trial_summary)
}
library(dplyr)

# Combine data frames from sessions 1-7
df_combined_1 <- bind_rows(session1, session2, session3, session4, session5, session6, session7)
head(df_combined_1)
## # A tibble: 6 × 36
##     ACA   CA3    DG    LS   MOs  root   SUB  VISp feedback `left contr.`
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>         <dbl>
## 1 1.27   1.53  2.32 1.94  0.938 0.5    2.08  1.69        1           0  
## 2 0.587  1.71  1.56 1.37  0.779 0.667  1.95  1.65        1           0  
## 3 1.19   1.99  3.41 1.73  0.920 1.44   2.64  2.28       -1           0.5
## 4 0.514  2.19  2.21 1.06  0.708 1.06   2.8   1.56       -1           0  
## 5 0.771  1.81  2.06 1.30  0.788 0.778  2.61  1.62       -1           0  
## 6 0.459  1.85  1.06 0.906 0.735 0.611  1.91  1.28        1           0  
## # ℹ 26 more variables: `right contr.` <dbl>, id <dbl>, CA1 <dbl>, POST <dbl>,
## #   VISl <dbl>, VISpm <dbl>, LP <dbl>, MG <dbl>, MRN <dbl>, NB <dbl>,
## #   SPF <dbl>, VISam <dbl>, LGd <dbl>, LSr <dbl>, TH <dbl>, VISa <dbl>,
## #   VPL <dbl>, OLF <dbl>, ORB <dbl>, PL <dbl>, AUD <dbl>, SSp <dbl>, CP <dbl>,
## #   EPd <dbl>, LD <dbl>, PIR <dbl>
# Combine data frames from sessions 8-18
df_combined_2 <- bind_rows(session8, session9, session10, session11, session12, session13, session14, session15, session16, session17, session18)
head(df_combined_2)
## # A tibble: 6 × 57
##     CA1   CA3    DG   ILA    LD    LP   LSr   MOs    PL    PO  root   SUB    TT
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.679  1.35 0.818 1.01   1.46  1.4   0    0.513  1.01  1.37  0    1.26  0.646
## 2 0.982  2.06 1.30  0.951  1.46  1.42  1    0.876  1.19  1.49  0    0.794 0.841
## 3 1      1.82 0.848 1.22   1.32  1.26  1    0.735  1.41  1.42  0    1.03  0.720
## 4 2.20   1.65 1.33  2.08   2.15  2.11  2.33 1.27   2.26  2.81  0.25 2.47  1.49 
## 5 1.02   1.65 1.48  1.03   1.44  1.22  1.67 0.894  1.24  1.96  0    1.32  1.01 
## 6 0.964  2.24 0.697 1.15   1.80  1.75  2.33 0.850  1.06  1.96  0    1.56  0.610
## # ℹ 44 more variables: VISa <dbl>, VISp <dbl>, feedback <dbl>,
## #   `left contr.` <dbl>, `right contr.` <dbl>, id <dbl>, ORBm <dbl>, TH <dbl>,
## #   VISam <dbl>, VISl <dbl>, VPL <dbl>, GPe <dbl>, MB <dbl>, MRN <dbl>,
## #   POL <dbl>, POST <dbl>, SCm <dbl>, SCsg <dbl>, VISrl <dbl>, CP <dbl>,
## #   LSc <dbl>, MOp <dbl>, PT <dbl>, ACA <dbl>, LGd <dbl>, LH <dbl>, MD <dbl>,
## #   MS <dbl>, RN <dbl>, SCs <dbl>, ZI <dbl>, ORB <dbl>, PAG <dbl>, RSP <dbl>,
## #   BLA <dbl>, VPM <dbl>, SSp <dbl>, SSs <dbl>, MEA <dbl>, RT <dbl>, …

I decided to use Linear Discrimination Analysis on my 3 chosen variables (root, values of left and right contrast) to build my prediction model because all of my predictor varaibles are continuous and I would assume that the variables follow a normal distribution. I also did not find any specific outliers trials throughout my data analysis, which indicates that a predictive model focusing on all data points would be effective. Therefore, LDA would be an effective prediction model to provide a prediction for my response variable (feedback).

I now ran LDA separately on my 2 data frames to separately predict the outcomes in the test data from session 1 and session 18.

# Load the necessary package
library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
# Step 1: Prepare the training data
train_data1 <-  df_combined_1
train_data2 <- df_combined_2
# Step 2: Train the LDA model
lda_model1 <- lda(feedback ~ root + `left contr.` + `right contr.`, data = train_data1)
lda_model2 <- lda(feedback ~ root + `left contr.` + `right contr.`, data = train_data2)
# Print the summary of the LDA model
summary(lda_model1)
##           Length Class  Mode     
## prior       2    -none- numeric  
## counts      2    -none- numeric  
## means       6    -none- numeric  
## scaling     3    -none- numeric  
## lev         2    -none- character
## svd         1    -none- numeric  
## N           1    -none- numeric  
## call        3    -none- call     
## terms       3    terms  call     
## xlevels     0    -none- list     
## na.action 249    omit   numeric
summary(lda_model2)
##           Length Class  Mode     
## prior       2    -none- numeric  
## counts      2    -none- numeric  
## means       6    -none- numeric  
## scaling     3    -none- numeric  
## lev         2    -none- character
## svd         1    -none- numeric  
## N           1    -none- numeric  
## call        3    -none- call     
## terms       3    terms  call     
## xlevels     0    -none- list     
## na.action 280    omit   numeric

Prediction performance on the test sets:

Discussion: